source-postgres: Handle NULL confirmed_flush_lsn #2436
Merged
Description:
In normal operation a replication slot will always have a non-null `confirmed_flush_lsn`, but we saw the other day that it is actually possible to observe a null value for that field if the replication slot is stuck in the middle of being created because it has to wait for a long-running transaction to complete.

Since one major cause of replication slot recreation is when the old slot gets invalidated, and one major cause of invalidation is when a long-running transaction forces excessive WAL retention, this is actually less rare than it seems. It will happen any time a long-running transaction causes slot invalidation and the user just hits "Backfill All" without killing the transaction (assuming it didn't end on its own, of course).
Since I would really like to make these `queryReplicationSlotInfo` checks fatal errors in the near future, this logic needs to be bulletproof, so we need to handle that situation.